Grouping methods for ongoing record linkage
Abstract
Grouping record pairs to determine which records belong to the same individual is an important part of the record linkage process. Merge grouping is the common approach, but other methods may be more appropriate when linking to a repository of previously linked data. In this paper, we applied a number of grouping strategies to three large-scale hospital datasets (comprising around 27 million records), each with a known truth set. These datasets were linked against a constructed 'repository' whose quality was varied. Experimental results show that alternative grouping methods can yield very large improvements in linkage quality, especially when the quality of the underlying repository is high. Best-link methods can remove between 25% and 90% of matching errors, depending on the characteristics of the underlying datasets.
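To make the contrast between merge grouping and best-link grouping concrete, here is a minimal sketch. It assumes that candidate links arrive as `(new_record, repository_record, score)` triples and that the repository maps each of its records to an existing person ID. The function names, record IDs, and scores are hypothetical and illustrative only. The code is not the paper's implementation: merge grouping is shown as a transitive closure (union-find over all matched pairs), and best-link grouping as "each new record joins only the group of its highest-scoring repository match".

```python
from collections import defaultdict


def merge_grouping(pairs):
    """Merge (transitive-closure) grouping: any chain of matched pairs
    ends up in a single group. Implemented with a simple union-find."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    for a, b, _score in pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a  # union the two groups

    groups = defaultdict(set)
    for rec in list(parent):
        groups[find(rec)].add(rec)
    return sorted(sorted(g) for g in groups.values())


def best_link_grouping(pairs, repo_group):
    """Best-link grouping (hypothetical sketch): each new record is assigned
    only to the repository group of its single highest-scoring match;
    all weaker candidate links are discarded."""
    best = {}
    for new_rec, repo_rec, score in pairs:
        if new_rec not in best or score > best[new_rec][1]:
            best[new_rec] = (repo_rec, score)
    return {new_rec: repo_group[repo_rec] for new_rec, (repo_rec, _s) in best.items()}


# Hypothetical toy data: two repository records belonging to different people.
repo_group = {"r1": "P1", "r2": "P2"}
pairs = [("n1", "r1", 0.9), ("n1", "r2", 0.6), ("n2", "r2", 0.8)]

if __name__ == "__main__":
    print(merge_grouping(pairs))                  # one group containing all four records
    print(best_link_grouping(pairs, repo_group))  # n1 -> P1, n2 -> P2
```

On this invented data, merge grouping follows `n1`'s weaker link to `r2` and fuses persons `P1` and `P2` into a single group. Best-link grouping keeps them apart. This shows why a best-link strategy can avoid some matching errors when the repository's existing groups are accurate, though it relies on the repository being correct.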